- Path: newsource.ihug.co.nz!usenet
- From: hoggy@ihug.co.nz (John Hogg)
- Newsgroups: comp.lang.c++
- Subject: I need help with a data structure
- Date: 5 Apr 1996 06:43:20 GMT
- Organization: The Internet Group
- Message-ID: <4k2fe8$cb9@newsource.ihug.co.nz>
- NNTP-Posting-Host: hoggy.ihug.co.nz
- X-Newsreader: WinVN 0.92.1
-
- Hi, I'm quite new to programming, so I would like some advice on a program
- I'm going to write in C++.
-
- I have a number of chapters for a book, stored as text files (i.e. all
- ASCII characters). I need to 'read' through these chapters, then store
- and display the new words that occur in each chapter so that a glossary
- can be made for the key words. I also have a text file that contains
- common words that should be omitted (e.g. "a", "the", etc.). There are no
- more than 1,000,000 words in each chapter, and I estimate there will be
- no more than 30,000 different words.
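- Before any lookup can happen, each chapter has to be split into words. A
- minimal sketch follows, assuming (this is not from the post) that a word
- is any maximal run of ASCII letters and that case should be ignored, so
- "The" and "the" count as one word. The name `tokenize` is hypothetical.
-

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split text into lowercase words.
// Assumption: a "word" is a maximal run of ASCII letters, so "cat's"
// yields "cat" and "s", and digits/punctuation act as separators.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {  // unsigned char keeps isalpha/tolower well-defined
        if (std::isalpha(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.push_back(current);  // a non-letter ends the current word
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);  // flush the final word
    return words;
}
```

- For example, `tokenize("The cat's hat, sweating!")` gives the, cat, s,
- hat and sweating. Splitting on apostrophes is a simplification; a real
- glossary tool may want to keep contractions together.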
-
- I have decided I should hash the words from the common-word file into an
- array. Then each word I read in from the chapters could be checked
- against this array, and if it is not there, I would hash it into another
- array of 30,000 items, discarding it if it is already there. Each such
- new word could then be added to a binary tree that sorts the words into
- alphabetical order for storage and display purposes.
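- The plan above maps closely onto today's standard library containers. The
- sketch below is one possible version under stated assumptions, not the
- poster's code: `std::unordered_set` (C++11, so not available to a 1996
- compiler) stands in for the hand-written hash arrays, and `std::set`,
- which implementations typically build as a balanced binary search tree,
- stands in for the binary tree. "New" is assumed to mean "not seen in any
- earlier chapter"; the function name `newWords` is hypothetical.
-

```cpp
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: given one chapter's (lowercased) words, return the
// words that are neither common words nor seen in an earlier chapter.
// `seen` carries state between chapters and is updated in place.
std::set<std::string> newWords(const std::vector<std::string>& chapterWords,
                               const std::unordered_set<std::string>& common,
                               std::unordered_set<std::string>& seen) {
    std::set<std::string> fresh;  // ordered container: iterates in sorted order
    for (const std::string& w : chapterWords) {
        if (common.count(w)) continue;   // skip common words (hash lookup)
        if (seen.insert(w).second) {     // .second is true only on first insertion
            fresh.insert(w);             // first time this word appears anywhere
        }
    }
    return fresh;
}
```

- Iterating over the returned set prints the chapter's new words in
- byte order, which is alphabetical for lowercase ASCII. With at most
- 30,000 distinct words, both containers stay small, so fixed-size arrays
- are unnecessary.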
-
- There is the added problem of words such as 'sweating' and 'sweat', where
- only 'sweat' would be required. I could simply check whether a word ends
- in 'ing' and remove it, but then 'dancing' would become 'danc'. Maybe it
- would be quicker to display all the words and get the user to delete the
- ones not required.
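- One middle ground between blind suffix stripping and full manual review
- is to strip 'ing' only when there is evidence the stem is a real word. The
- sketch below is a hypothetical heuristic, not a proper stemmer such as
- Porter's algorithm: it assumes a `vocabulary` set of words already
- collected, and maps a word to its stem (or stem + 'e') only if that form
- is in the set. The name `reduceIng` is made up for illustration.
-

```cpp
#include <set>
#include <string>

// Hypothetical heuristic, not a real stemmer: map a word ending in "ing"
// to its stem only if the stem (or stem + "e") is already in `vocabulary`.
// Otherwise return the word unchanged so a person can review it later.
std::string reduceIng(const std::string& word,
                      const std::set<std::string>& vocabulary) {
    const std::string suffix = "ing";
    // Reject words whose stem would be under three letters ("sing", "bring").
    if (word.size() <= suffix.size() + 2 ||
        word.compare(word.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return word;
    }
    const std::string stem = word.substr(0, word.size() - suffix.size());
    if (vocabulary.count(stem)) return stem;              // sweating -> sweat
    if (vocabulary.count(stem + "e")) return stem + "e";  // dancing -> dance
    return word;  // no evidence either way: keep the original word
}
```

- With a vocabulary containing 'sweat' and 'dance', this turns 'sweating'
- into 'sweat' and 'dancing' into 'dance' while leaving 'string' alone.
- Doubled consonants ('running' -> 'runn') are not handled, so anything
- left unmatched would still go to the user for manual deletion.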
-
- I would appreciate any help you can give me.
-